
**************************************************************
* Create a dataset with 2010 election participation
**************************************************************

clear all
odbc load, table("riksdagsval_2010") connectionstring("DRIVER={SQL Server};SERVER={mq02\b};DATABASE={P0846};Trusted_Connection={Yes}")

* Coding:
* "The numbers correspond to the different symbols in the electoral roll as follows"
* 1 = empty box
* 2 = / (voted in person)
* 3 = square (no right to vote)
* 4 = P (postal vote)
* 5 = V (late postal vote)
* 6 = ? (voted, but unclear whether the vote is /, P or V)
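* Optional sketch: attach the codes above as value labels so that the
* tabulations of r below are self-documenting. The label name r_codes is
* hypothetical, and this assumes r is numeric and uses only the codes listed.
label define r_codes 1 "empty box" 2 "/ voted in person" 3 "no right to vote" ///
    4 "P postal vote" 5 "V late postal vote" 6 "? voted, type unclear", replace
label values r r_codes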
* Create a variable for whether an individual voted in the 2010 national (Riksdag) election
tab r, miss
* Drop individuals without the right to vote (code 3)
drop if r == 3
* Drop the participation variables for the regional (l) and municipal (k) elections
drop l k 
tab r, miss
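* Codes 2, 4, 5 and 6 count as having voted; code 1 (empty box) as not voted.
* Exclude missing codes explicitly, since Stata evaluates missing != 1 as true.
gen Voted2010 = r != 1 if !missing(r)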
tab Voted2010, miss
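* Suggested sanity check: cross-tabulate the raw code against the new
* indicator to confirm that only code 1 maps to 0 and that no missing
* codes were counted as votes. Must run before r is dropped.
tab r Voted2010, miss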
drop r

* Some individuals occur multiple times in data
* i) Keep only one case if they are identical on all variables
duplicates drop
* ii) One duplicate remains on P0846_LopNr_PersonNr alone (the records differ
*     on other variables). Keep the record in which the person voted.
duplicates tag P0846_LopNr_PersonNr, gen(multiple_personid)
tab multiple_personid, miss
sort multiple_personid P0846_LopNr_PersonNr
egen max_Rrost = max(Voted2010), by(P0846_LopNr_PersonNr)
keep if max_Rrost == Voted2010
* Keep only the ID and the outcome, then drop rows that are now identical
keep P0846_LopNr_PersonNr Voted2010
duplicates drop
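* Suggested check: confirm that each person now appears exactly once.
* isid exits with an error if P0846_LopNr_PersonNr does not uniquely
* identify the observations, so a later merge on the ID cannot silently fail.
isid P0846_LopNr_PersonNr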
 
* Rename the person identifier to a shorter, more readable name
rename P0846_LopNr_PersonNr PersonId

* Save voting data 2010
compress
save "Data/data_voting_2010", replace

